Fix issues with chat template application on responses requests #4173

Merged
michalkulakowski merged 20 commits into main from mkulakow/responses_fix_2
May 19, 2026

Conversation

@michalkulakowski
Collaborator

🛠 Summary

JIRA/Issue if applicable.
Describe the changes.

🧪 Checklist

  • Unit tests added.
  • The documentation updated.
  • Change follows security best practices.

Copilot AI review requested due to automatic review settings April 30, 2026 12:48
Contributor

Copilot AI left a comment

Pull request overview

This PR targets improved OpenAI Responses API request handling in OVMS LLM serving so the existing Python/Jinja chat-template path can reliably consume Responses-format inputs (including tool-calling and reasoning-related items).

Changes:

  • Add debug logging before applying the Python/Jinja chat template.
  • Extend Responses input parsing to accept additional item/content shapes (reasoning summaries, tool-call items, missing/empty content, output_text).
  • Build a processedJson payload in chat/completions-style (messages + converted tools) for the Python/Jinja template path.
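The tool conversion mentioned in the last bullet can be pictured with a small, library-free sketch. It assumes the Responses request lists function tools flat (`name`, `description`, `parameters` at the top level) and that the chat/completions-style payload nests the same fields under a `function` key. `ResponsesTool` and `toChatCompletionsTool` are hypothetical names for illustration; the PR itself works on rapidjson documents in `openai_responses.cpp`, and this is not that code.

```cpp
#include <string>

// Hypothetical, simplified view of a Responses-style function tool.
// Field values are assumed to need no JSON escaping in this sketch.
struct ResponsesTool {
    std::string name;
    std::string description;
    std::string parametersJson;  // JSON schema, already serialized
};

// Re-nest the flat Responses tool under a "function" key, which is the
// shape chat/completions-style templates usually expect.
std::string toChatCompletionsTool(const ResponsesTool& tool) {
    std::string out = "{\"type\":\"function\",\"function\":{";
    out += "\"name\":\"" + tool.name + "\",";
    out += "\"description\":\"" + tool.description + "\",";
    out += "\"parameters\":" + tool.parametersJson;
    out += "}}";
    return out;
}
```

A real implementation would escape string values and copy the schema as a JSON value rather than concatenating text; the sketch only shows the change in nesting.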

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| src/llm/py_jinja_template_processor.cpp | Adds a debug log of the incoming request body before applying the chat template. |
| src/llm/apis/openai_responses.cpp | Enhances Responses API parsing and constructs chat/completions-compatible processedJson including tool conversion and tool-call merging. |

Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/py_jinja_template_processor.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from 035101b to 3ac1cf1 May 6, 2026 13:22
dtrawins added this to the 2026.2_rc milestone May 8, 2026
michalkulakowski force-pushed the mkulakow/responses_fix_2 branch 3 times, most recently from 773cb1a to 0aa9a48 May 12, 2026 11:53
Comment thread src/llm/py_jinja_template_processor.cpp Outdated
return false;
}

SPDLOG_DEBUG("Before chat template: \n {}", requestBody);
Collaborator

do we want to keep it?

Comment thread src/llm/servable.cpp Outdated
} catch (const std::exception& e) {
SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Failed to apply chat template: {}", e.what());
- return absl::Status(absl::StatusCode::kInvalidArgument, "Failed to apply chat template. The model either does not have chat template or has an invalid one.");
+ return absl::Status(absl::StatusCode::kInvalidArgument, absl::StrCat("Failed to apply chat template: ", e.what()));
Collaborator

do we want to expose call stack to the user?
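One possible middle ground for the concern above, sketched under assumptions: keep the full exception text in the debug log and return only its first line, capped in length, to the client. `userFacingTemplateError` is a hypothetical helper, not something from this PR, and it assumes the first line of `e.what()` carries the summary while any traceback follows on later lines.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: keep only the first line of an exception message and
// cap its length, so multi-line tracebacks stay in the server log.
std::string userFacingTemplateError(const std::string& what, std::size_t maxLen = 200) {
    // find() returns npos when there is no newline; substr then keeps everything.
    std::string firstLine = what.substr(0, what.find('\n'));
    if (firstLine.size() > maxLen) {
        firstLine.resize(maxLen);
    }
    return "Failed to apply chat template: " + firstLine;
}
```

The returned string could then be passed to `absl::Status` in place of the raw `e.what()`, while the existing `SPDLOG_LOGGER_DEBUG` call continues to log the full message.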

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from 2153f52 to 99b08c3 May 13, 2026 11:23
mkdir -p ${HOME}/models
docker run -d --user $(id -u):$(id -g) --rm -p 8000:8000 -v ${HOME}/models:/models --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) openvino/model_server:weekly \
- --rest_port 8000 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com
+ --rest_port 8122 --model_repository_path /models --source_model Junrui2021/Qwen3-VL-8B-Instruct-int4 --model_name ovms-model --tool_parser hermes3 --target_device GPU --task text_generation --pipeline_type VLM_CB --allowed_media_domains raw.githubusercontent.com
Collaborator

I think this change is unintended

michalkulakowski force-pushed the mkulakow/responses_fix_2 branch from ed876b5 to 6a0baef May 14, 2026 09:44
michalkulakowski requested a review from przepeck May 14, 2026 11:48
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/test/http_openai_handler_test.cpp:3902

  • Same issue as parseResponses(): EXPECT_FALSE on parse errors is non-fatal, so the helper can proceed to construct/parse a handler against an invalid document and return a status that may not reflect the original failure clearly. Consider using ASSERT_FALSE for JSON parsing failures (or returning an explicit InvalidArgument status when doc.HasParseError() is true).
    doc.Parse(json.c_str());
    EXPECT_FALSE(doc.HasParseError()) << json;
    std::optional<uint32_t> maxTokensLimit;
    uint32_t bestOfLimit = 0;
    std::optional<uint32_t> maxModelLength;
    auto apiHandler = std::make_shared<ovms::OpenAIResponsesHandler>(
        doc, ovms::Endpoint::RESPONSES, std::chrono::system_clock::now(), tokenizer);
    return apiHandler->parseRequest(maxTokensLimit, bestOfLimit, maxModelLength);

Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/test/http_openai_handler_test.cpp
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (2)

src/test/http_openai_handler_test.cpp:3915

  • parseResponses() returns nullptr after ADD_FAILURE() when parsing fails; downstream helpers/tests (e.g., expectResponsesEquivalentToChatCompletions) dereference the returned pointer without asserting non-null, which can crash the test and obscure the real failure. Add an ASSERT_NE(apiHandler, nullptr) at the start of consumers or refactor the helper to fail fatally.
    doc.Parse(json.c_str());
    if (doc.HasParseError()) {
        ADD_FAILURE() << "Failed to parse JSON: " << json;
        return nullptr;
    }

src/llm/apis/openai_responses.cpp:550

  • ProcessedJsonSink::emitStandaloneReasoning() omits the "content" field entirely. Templates that access message.content unconditionally will fail on such messages. Emit "content": "" (empty string) for standalone reasoning turns to keep processedJson compatible with a wider set of templates.
    void emitStandaloneReasoning(const std::string& reasoning) {
        rapidjson::Value msgObj(rapidjson::kObjectType);
        msgObj.AddMember("role", rapidjson::Value("assistant", alloc), alloc);
        msgObj.AddMember("reasoning_content", rapidjson::Value(reasoning.c_str(), alloc), alloc);
        messagesArray.PushBack(msgObj, alloc);
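The suggested fix can be illustrated without rapidjson. The sketch below uses an ordered list of key/value pairs as a stand-in for the JSON message object; `Message` and `makeStandaloneReasoning` are hypothetical names, and the only point shown is that the standalone reasoning turn carries an empty `content` string alongside `reasoning_content`.

```cpp
#include <string>
#include <utility>
#include <vector>

// Library-free stand-in for a JSON message object: ordered key/value pairs.
using Message = std::vector<std::pair<std::string, std::string>>;

// Build a standalone assistant reasoning turn that still carries an empty
// "content" key, so templates reading message.content find a string.
Message makeStandaloneReasoning(const std::string& reasoning) {
    return {
        {"role", "assistant"},
        {"content", ""},
        {"reasoning_content", reasoning},
    };
}
```

In the rapidjson-based code quoted above, the equivalent change would presumably be one extra `AddMember` call for `"content"` before the message is pushed to `messagesArray`.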

Comment thread src/test/http_openai_handler_test.cpp
Comment thread src/test/http_openai_handler_test.cpp
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/servable.cpp Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/llm/servable.cpp Outdated
michalkulakowski requested a review from Copilot May 18, 2026 09:20
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/apis/openai_responses.cpp
Comment thread src/llm/apis/openai_responses.cpp
Comment thread src/llm/apis/openai_responses.cpp Outdated
Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp
michalkulakowski requested a review from Copilot May 18, 2026 09:39
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread src/llm/visual_language_model/continuous_batching/servable.cpp
Comment thread src/llm/apis/openai_responses.cpp
Comment thread src/llm/apis/openai_responses.cpp
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/llm/apis/openai_responses.cpp:660

  • parseInput() logs “Parsed responses input … without mutating request JSON”, but parseResponsesPart() now mutates doc before calling parseInput (tools normalization and potentially adding chat_template_kwargs). This debug message is misleading; consider rewording it or moving the tools normalization to after parseInput so the statement remains true.
    } else {
        return absl::InvalidArgumentError("input is not a string or array");
    }

    SPDLOG_LOGGER_DEBUG(llm_calculator_logger, "Parsed responses input directly to chat history without mutating request JSON");
    return absl::OkStatus();

src/test/http_openai_handler_test.cpp:708

  • This comment says “For Responses, processedJson is always built from chatHistory”, but the updated Responses implementation now builds processedJson directly from the original input (including reasoning/function_call handling) rather than re-serializing chatHistory. Please update the comment to reflect the current contract to avoid confusion when maintaining these tests.
    ASSERT_NE(apiHandler, nullptr);

    // For Responses, processedJson is always built from chatHistory.
    // For chat/completions with simple text, processedJson is empty (original body is used instead).
    // In both cases, the chatHistory should be equivalent.

Comment thread src/llm/apis/openai_responses.cpp Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comment thread src/llm/apis/openai_responses.cpp Outdated
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

michalkulakowski merged commit f386355 into main May 19, 2026
1 check passed

5 participants